Automatic Ticket Assignment

Problem Statement:

Manual assignment of incidents is time-consuming and labor-intensive. Human error leads to misrouted tickets, so resources are consumed ineffectively. Manual assignment also increases response and resolution times, which results in deteriorating user satisfaction and poor customer service.

Functional teams must spend additional effort re-assigning misrouted incidents to the right functional groups. While this happens, some incidents sit in a queue and are not addressed in a timely manner, resulting in poor customer service. AI techniques that classify incidents to the right functional group can help organizations reduce issue resolution time and let teams focus on more productive tasks.

Objective:

The objectives of the project are to:

  • Learn how to use different classification models.
  • Use transfer learning with pre-built models.
  • Learn to set optimizers, loss functions, epochs, learning rate, batch size, checkpointing, early stopping, etc.
  • Read research papers in the given domain to gain knowledge of advanced models for this problem.

This capstone project intends to reduce the manual intervention of IT operations or service desk teams by automating the ticket assignment process. The goal is to create a text-classification-based ML model that can automatically classify any new ticket, by analysing its description, into one of the relevant assignment groups; the model could later be integrated with any ITSM tool such as ServiceNow. Based on the ticket description, our model will output the probability of assigning it to one of the 74 groups.

Preparing the Google Drive

Loading the relevant Libraries

Load and Understand the Data

Observation:

The Excel sheet contains 4 columns of ticket details with a total of 8,500 rows.

  1. Short description --> Contains 8492 records, i.e. 8 null records.
  2. Description --> Contains 8499 records, i.e. 1 null record.
  3. Caller --> Doesn't have any blank data.
  4. Assignment group --> Doesn't have any blank data.
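As a sketch, the null counts above can be reproduced with pandas. The small frame below is a hypothetical stand-in for the Excel sheet; the real notebook would load the file with `pd.read_excel(...)` instead.

```python
import pandas as pd

# Hypothetical mini-frame mimicking the ticket dataset's four columns;
# the real notebook reads the Excel sheet with pd.read_excel(...).
df = pd.DataFrame({
    "Short description": ["password reset", None, "vpn issue"],
    "Description": ["user cannot login", "outlook crash", None],
    "Caller": ["a", "b", "c"],
    "Assignment group": ["GRP_0", "GRP_1", "GRP_0"],
})

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)
```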

Observation:

Data Cleansing

We will perform the following actions:

Observations :

Exploratory Data Analysis

Grouping by Categories

Distribution by group

Observation:

Description length analysis and its relation with Assignment group

Observation:

The distribution of description lengths is extremely right-skewed; let us omit the tail and focus on the major portion of the distribution.

Observation:

Let us consider data up to the 95th percentile for further visualization.

Observation:

There is no clear pattern visible to describe the relation between group and description lengths.

It is, however, visible that the groups have outliers; some groups fall in the lower range, while most have lengths in the range of 100 to 400.

Observation:

It is clear that GRP_2 has very long descriptions. Records of the other groups have a mean length of less than 500 words.

Text Pre-processing

Stemming and Lemmatization are Text Normalization (or sometimes called Word Normalization) techniques in the field of Natural Language Processing that are used to prepare text, words, and documents for further processing. In grammar, inflection is known as the modification of a word to express different grammatical categories such as tense, case, voice, aspect, person, number, gender, and mood. An inflection expresses one or more grammatical categories with a prefix, suffix or infix, or another internal modification such as a vowel change.

Stemming

Stemming is the process of reducing inflection in words to their root forms such as mapping a group of words to the same stem even if the stem itself is not a valid word in the Language.
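As a toy illustration of suffix stripping (not a real stemming algorithm such as NLTK's PorterStemmer, which a notebook like this would typically use), note how some outputs, e.g. "mapp", are not valid English words, exactly as described above:

```python
# Toy suffix-stripping stemmer for illustration only.
SUFFIXES = ("ing", "ies", "ied", "ly", "ed", "es", "s")

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

# "mapping", "mapped" and "maps" all collapse toward the same stem,
# even though "mapp" is not itself a valid word.
print([crude_stem(w) for w in ["mapping", "mapped", "maps", "troubling"]])
```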

Lemmatization

Lemmatization, unlike Stemming, reduces the inflected words properly ensuring that the root word belongs to the language. In Lemmatization, root word is called Lemma. A lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words.

Top Unigrams
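A minimal way to compute top unigrams is with `collections.Counter`; the documents below are hypothetical stand-ins for the cleaned ticket descriptions:

```python
from collections import Counter

# Hypothetical cleaned ticket descriptions; the notebook would use the
# preprocessed Description column instead.
docs = [
    "password reset request",
    "unable to login password expired",
    "vpn not connecting",
]

# Flatten all documents into one token stream and count occurrences.
tokens = [tok for doc in docs for tok in doc.split()]
top_unigrams = Counter(tokens).most_common(3)
print(top_unigrams)
```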

Word Cloud

A word cloud is a collection, or cluster, of words depicted in different sizes. The bigger and bolder the word appears, the more often it’s mentioned within a given text and the more important it is.

Also known as tag clouds or text clouds, these are ideal ways to pull out the most pertinent parts of textual data, and they often help business users compare and contrast two pieces of text to find wording similarities between them.

Let's write a generic method to generate Word Clouds

Comments:

It's indicative from the n-gram analysis and the word cloud that the entire dataset speaks mostly about issues around

Analysis of GRP_0, the most frequent assignment group, reveals that it deals mostly with maintenance problems such as password resets, account locks, login issues, and ticket updates.

Many of the GRP_0 tickets could be avoided by putting automation scripts/mechanisms in place to resolve these common maintenance issues. This would lower the inflow of service tickets, saving person-hours and increasing business revenue.

Building the Models

We will proceed to try the different model algorithms mentioned below, so as to classify the tickets and identify the best modeling technique:

Now we'll create another column of categorical datatype from the Assignment groups. We'll also write some generic methods to plot evaluation metrics.
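A sketch of deriving an integer-coded categorical column from the group labels with pandas; the toy frame is a hypothetical stand-in for the full dataset of 74 groups:

```python
import pandas as pd

# Hypothetical Assignment group column; the real frame has 74 groups.
df = pd.DataFrame({"Assignment group": ["GRP_0", "GRP_1", "GRP_0", "GRP_2"]})

# Encode the string labels as integer category codes for modeling.
df["group_code"] = df["Assignment group"].astype("category").cat.codes
print(df["group_code"].tolist())
```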

Naive Bayes
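A minimal sketch of a TF-IDF + Multinomial Naive Bayes pipeline with scikit-learn, fitted on a hypothetical four-ticket corpus; the notebook would fit the same kind of pipeline on the full, preprocessed dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical tiny corpus with two assignment groups.
texts = ["password reset please", "reset my password",
         "vpn tunnel down", "vpn not connecting"]
labels = ["GRP_0", "GRP_0", "GRP_1", "GRP_1"]

# Vectorize the descriptions with TF-IDF, then fit a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Unseen tokens are ignored; "password" pulls the prediction toward GRP_0.
pred = model.predict(["forgot password"])[0]
print(pred)
```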

K-Nearest Neighbor (KNN)

Support Vector Machine (SVM)

Decision Trees

Random Forest Classifier

Bidirectional LSTM

Observation:

All the models listed above are highly overfitted, as the training accuracy is high while the testing accuracy is low:

Summary - Initial Report

Amongst all the model architectures we've tried, the accuracy of each model is shown in the table below. The statistical models are overfitted to a high degree. One obvious reason is that the dataset is highly imbalanced.

Following are some of the techniques we'll be trying in Milestone-2 as part of fine tuning.

[Image: summary table of model accuracies]

Milestone 2 - Fine tune

Dealing with Imbalanced dataset - Approach 1 - Split the data into Training, Validation and Test data sets

Modeling with Dataset split into Training, Validation and Test data sets

Multinomial Naive Bayes

Bidirectional LSTM

Dealing with Imbalanced dataset - Approach 2 - Resampling technique (Upsampling)

Label Encoding 'Assignment group' target class
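A minimal sketch of upsampling with pandas plus integer label encoding; the toy frame and `random_state` are illustrative stand-ins for the notebook's full resampling setup:

```python
import pandas as pd

# Hypothetical imbalanced toy frame; the notebook resamples the full dataset.
df = pd.DataFrame({
    "text": ["a", "b", "c", "d", "e", "f"],
    "Assignment group": ["GRP_0", "GRP_0", "GRP_0", "GRP_0", "GRP_1", "GRP_2"],
})

# Upsample every minority class to the majority-class count by
# sampling its rows with replacement.
max_count = df["Assignment group"].value_counts().max()
upsampled = (
    df.groupby("Assignment group", group_keys=False)
      .apply(lambda g: g.sample(max_count, replace=True, random_state=42))
)

# Label-encode the target: map each group name to an integer code.
upsampled["label"] = upsampled["Assignment group"].astype("category").cat.codes
print(upsampled["Assignment group"].value_counts().to_dict())
```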

Modeling with resampled data

Multinomial Naive Bayes

Support Vector Machine (SVM)

Decision Trees

Random Forest

Bidirectional LSTM

Final Conclusion

We first analysed the dataset provided to us and understood the structure of the data - the number of columns, fields, datatypes, etc.

We did Exploratory Data Analysis to derive further insights from this dataset and found that:

  • The data is highly imbalanced; around 45% of the groups have fewer than 20 tickets.
  • A few of the tickets are in a foreign language such as German.
  • The data has a lot of noise in it; for example, a few tickets related to account setup are spread across multiple assignment groups.

We performed data cleaning and preprocessing:

  • Made all text lowercase so that the algorithm does not treat the same word in different cases as different words
  • Removed noise, i.e. everything that isn't a standard letter or number (punctuation, numerical values)
  • Removed extra spaces
  • Removed punctuation
  • Removed words containing numbers
  • Stop-word removal: some extremely common words that appear to be of little value in helping select documents matching a user's need are excluded from the vocabulary entirely; these words are called stop words
  • Lemmatization
  • Tokenization: the process of converting normal text strings into a list of tokens, i.e. the words we actually want. A sentence tokenizer can be used to find the list of sentences, and a word tokenizer can be used to find the list of words in strings.
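The cleaning steps above can be sketched as follows; the stop-word list here is an illustrative stand-in for a full one (e.g. NLTK's), and lemmatization is omitted for brevity:

```python
import re

# Illustrative stop-word list; a real pipeline would use a full set.
STOP_WORDS = {"the", "is", "a", "to", "my"}

def clean(text: str) -> list:
    text = text.lower()                       # lowercase
    text = re.sub(r"[^a-z\s]", " ", text)     # drop punctuation/digits
    text = re.sub(r"\s+", " ", text).strip()  # collapse extra spaces
    tokens = text.split()                     # tokenize into words
    return [t for t in tokens if t not in STOP_WORDS]  # remove stop words

print(clean("Unable to login!! Error #404, please RESET my password."))
```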

We then trained various supervised machine learning models using the cleaned and preprocessed dataset.

We trained the dataset using below models:

  • Multinomial NB
  • K Nearest Neighbors
  • Linear Support vector Machine
  • Decision Tree
  • Random Forest
  • Bidirectional LSTM

We were able to attain much higher accuracy after upsampling the data of the minority groups in Approach 2. A snapshot of the results is attached for reference.

[Image: snapshot of model accuracies after upsampling]

This shows that properly cleaned and preprocessed data has the power to propel even simpler models ahead of complex ones, beating them in accuracy and other metrics.

As we saw, the accuracy obtained after running the models on the upsampled dataset clearly indicates that the models are biased.

So we will consider a different approach to handling the imbalanced dataset, which we mentioned in the interim report and restate here:

Future Approach

1. We will try to reduce the number of classes from 16 to an optimal number using K-means clustering (elbow method).

2. We will try to translate non-English words into English.

3. We will try to handle our imbalanced dataset using up-sampling and down-sampling.

4. We will use the SMOTE technique for handling the imbalanced dataset.

5. We will also implement 2 deep learning models for better inference.